185 research outputs found

    MEGASAT: automated inference of microsatellite genotypes from sequence data

    Get PDF
    MEGASAT is software that enables genotyping of microsatellite loci using next-generation sequencing data. Microsatellites are amplified in large multiplexes, and then sequenced in pooled amplicons. MEGASAT reads sequence files and automatically scores microsatellite genotypes. It uses fuzzy matches to allow for sequencing errors and applies decision rules to account for amplification artefacts, including nontarget amplification products, replication slippage during PCR (amplification stutter) and differential amplification of alleles. An important fea- ture of MEGASAT is the generation of histograms of the length–frequency distributions of amplification products for each locus and each individual. These histograms, analogous to electropherograms traditionally used to score microsatellite genotypes, enable rapid evaluation and editing of automatically scored genotypes. MEGASAT is written in Perl, runs on Windows, Mac OS X and Linux systems, and includes a simple graphical user interface. We demon- strate MEGASAT using data from guppy, Poecilia reticulata. We genotype 1024 guppies at 43 microsatellites per run on an Illumina MiSeq sequencer. We evaluated the accuracy of automatically called genotypes using two methods, based on pedigree and repeat genotyping data, and obtained estimates of mean genotyping error rates of 0.021 and 0.012. In both estimates, three loci accounted for a disproportionate fraction of genotyping errors; conversely, 26 loci were scored with 0–1 detected error (error rate ≤0.007). Our results show that with appropriate selection of loci, automated genotyping of microsatellite loci can be achieved with very high throughput, low genotyping error and very low genotyping costs

    Parsimonious Inference of Hybridization in the Presence of Incomplete Lineage Sorting

    Get PDF
    Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detect hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and taking their differences as indicators of hybridization. However, methods that follow this approach mostly ignore population effects, such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and, hence, must be accounted for in the analysis framework. To address this issue, we present a parsimony criterion for reconciling gene trees within the branches of a phylogenetic network, and a local search heuristic for inferring phylogenetic networks from collections of gene-tree topologies under this criterion. This framework enables phylogenetic analyses while accounting for both hybridization and ILS. Further, we propose two techniques for incorporating information about uncertainty in gene-tree estimates. Our simulation studies demonstrate the good performance of our framework in terms of identifying the location of hybridization events, as well as estimating the proportions of genes that underwent hybridization. Also, our framework shows good performance in terms of efficiency on handling large data sets in our experiments. Further, in analyzing a yeast data set, we demonstrate issues that arise when analyzing real data sets. While a probabilistic approach was recently introduced for this problem, and while parsimonious reconciliations have accuracy issues under certain settings, our parsimony framework provides a much more computationally efficient technique for this type of analysis. Our framework now allows for genome-wide scans for hybridization, while also accounting for ILS

    A new, fast algorithm for detecting protein coevolution using maximum compatible cliques

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The MatrixMatchMaker algorithm was recently introduced to detect the similarity between phylogenetic trees and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices, and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time.</p> <p>Results</p> <p>In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved more than 600× speedup with comparable accuracy to the original MMM.</p> <p>Conclusions</p> <p>MMMvII will thus allow for more more extensive and intricate analyses of coevolution.</p> <p>Availability</p> <p>An implementation of the MMMvII algorithm is available at: <url>http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php</url></p

    GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Position-specific priors (PSP) have been used with success to boost EM and Gibbs sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. The use of prior information has not yet been used in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied.</p> <p>Results</p> <p>We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSP's from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence-sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as other twelve state-of-the-art approaches for the same task, even without combining priors. Furthermore, by considering combined priors, GRISOTTO is considerably more accurate than the state-of-the-art approaches for the same task. We also show that PSP's improve GRISOTTO ability to retrieve motifs from mouse ChiP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote.</p> <p>Conclusions</p> <p>The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately.</p

    Phylogenomic Analysis of Marine Roseobacters

    Get PDF
    Background: Members of the Roseobacter clade which play a key role in the biogeochemical cycles of the ocean are diverse and abundant, comprising 10–25 % of the bacterioplankton in most marine surface waters. The rapid accumulation of whole-genome sequence data for the Roseobacter clade allows us to obtain a clearer picture of its evolution. Methodology/Principal Findings: In this study about 1,200 likely orthologous protein families were identified from 17 Roseobacter bacteria genomes. Functional annotations for these genes are provided by iProClass. Phylogenetic trees were constructed for each gene using maximum likelihood (ML) and neighbor joining (NJ). Putative organismal phylogenetic trees were built with phylogenomic methods. These trees were compared and analyzed using principal coordinates analysis (PCoA), approximately unbiased (AU) and Shimodaira–Hasegawa (SH) tests. A core set of 694 genes with vertical descent signal that are resistant to horizontal gene transfer (HGT) is used to reconstruct a robust organismal phylogeny. In addition, we also discovered the most likely 109 HGT genes. The core set contains genes that encode ribosomal apparatus, ABC transporters and chaperones often found in the environmental metagenomic and metatranscriptomic data. These genes in the core set are spread out uniformly among the various functional classes and biological processes. Conclusions/Significance: Here we report a new multigene-derived phylogenetic tree of the Roseobacter clade. Of particular interest is the HGT of eleven genes involved in vitamin B12 synthesis as well as key enzynmes fo

    Reproducing the manual annotation of multiple sequence alignments using a SVM classifier

    Get PDF
    Motivation: Aligning protein sequences with the best possible accuracy requires sophisticated algorithms. Since the optimal alignment is not guaranteed to be the correct one, it is expected that even the best alignment will contain sites that do not respect the assumption of positional homology. Because formulating rules to identify these sites is difficult, it is common practice to manually remove them. Although considered necessary in some cases, manual editing is time consuming and not reproducible. We present here an automated editing method based on the classification of ‘valid’ and ‘invalid’ sites

    Phylogenetic Detection of Recombination with a Bayesian Prior on the Distance between Trees

    Get PDF
    Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition∶transversion heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination

    Reductive Evolution of the Mitochondrial Processing Peptidases of the Unicellular Parasites Trichomonas vaginalis and Giardia intestinalis

    Get PDF
    Mitochondrial processing peptidases are heterodimeric enzymes (α/βMPP) that play an essential role in mitochondrial biogenesis by recognizing and cleaving the targeting presequences of nuclear-encoded mitochondrial proteins. The two subunits are paralogues that probably evolved by duplication of a gene for a monomeric metallopeptidase from the endosymbiotic ancestor of mitochondria. Here, we characterize the MPP-like proteins from two important human parasites that contain highly reduced versions of mitochondria, the mitosomes of Giardia intestinalis and the hydrogenosomes of Trichomonas vaginalis. Our biochemical characterization of recombinant proteins showed that, contrary to a recent report, the Trichomonas processing peptidase functions efficiently as an α/β heterodimer. By contrast, and so far uniquely among eukaryotes, the Giardia processing peptidase functions as a monomer comprising a single βMPP-like catalytic subunit. The structure and surface charge distribution of the Giardia processing peptidase predicted from a 3-D protein model appear to have co-evolved with the properties of Giardia mitosomal targeting sequences, which, unlike classic mitochondrial targeting signals, are typically short and impoverished in positively charged residues. The majority of hydrogenosomal presequences resemble those of mitosomes, but longer, positively charged mitochondrial-type presequences were also identified, consistent with the retention of the Trichomonas αMPP-like subunit. Our computational and experimental/functional analyses reveal that the divergent processing peptidases of Giardia mitosomes and Trichomonas hydrogenosomes evolved from the same ancestral heterodimeric α/βMPP metallopeptidase as did the classic mitochondrial enzyme. The unique monomeric structure of the Giardia enzyme, and the co-evolving properties of the Giardia enzyme and substrate, provide a compelling example of the power of reductive evolution to shape parasite biology
    corecore